Apparatus and method for partial execution blocking of instructions following a data cache miss

ABSTRACT

A partially blocking data cache having improved microprocessor performance while maintaining data consistency between external memory and cache memory. The data cache of the present invention is used in a computer system and is partially blocking in that it blocks the execution of any store instruction subsequent to an outstanding load instruction that missed the cache. The present invention offers increased microprocessor efficiency by allowing execution of subsequent load instructions while fewer than a predetermined number of preceding load instructions are still outstanding. The present invention utilizes a counter within the data cache unit to track the number of outstanding load misses. The present invention provides increased performance without undue or overly complex modifications to existing caching systems. The present invention operates advantageously within a computer system having a relatively large number of registers associated with the microprocessor, such that store instructions represent a relatively small fraction of the instructions executed by the microprocessor.

BACKGROUND OF THE INVENTION

(1) Field of the Invention

The present invention relates to the field of memory accessing technology for the storage and retrieval of data and/or instructions from a memory unit. Specifically, the present invention relates to the field of memory cache technology employed to increase the speed and efficiency of memory accessing.

(2) Prior Art

Among the many elements of a computer system are a central processing unit (CPU), memory storage units (which may be RAM, ROM, or other types), and interface and control logic units that couple the CPU to the memory storage units for accessing and processing data and instructions. Generally, each time the CPU processes an instruction it must access a memory storage unit, either to obtain the data to be processed or to fetch the instruction itself. The CPU is therefore constantly interfacing with the memory storage units. Recent developments in computer technology have offered a variety of memory storage units adapted for different functions and having different characteristics. In particular, the use of a data cache memory unit and associated logic has become extremely popular because of the versatility and efficiency of data accessing that a data cache offers.

A data cache is a special-purpose memory unit designed to interface closely with the CPU. It is typically a small memory unit designed for high-speed access by the CPU of a microprocessor, and its size is limited by the constraints of interfacing the cache with the CPU. Because of this close interface, the CPU can access data in the cache at very high speed, versus the relatively long access times required by other, external, memory units. Many cache units are located on the same chip as the microprocessor for high-speed access. A cache is filled with data that the CPU will probably use on a routine or cyclic basis. This data is placed into the cache memory from the external memory (or generated by the CPU and placed into the cache memory), typically during the execution of a program. That is, the most recently used data, determined by monitoring the data flow during program execution, is placed into the cache memory. New data is placed or replaced into the cache and tagged for identification, while older data (i.e., data not accessed over a given time period) is gradually "aged out" or removed from the cache. The data placed within the cache is tagged with a unique identifier related to the effective memory address of the corresponding data in the external memory unit.

During program execution, when the CPU needs to access (i.e., load or store) data at a particular address within the external memory unit, a special cache logic unit first scans the contents of the cache memory unit to determine whether the desired address is located within the high-speed cache. If so, the data is accessed via the cache using the tag identifier and the position of the data within the cache. In this case external memory access is not required, and the delay associated with external memory access is avoided. Each time data is accessed via the cache, a significant amount of processing time is saved. The terms "memory cache operations" or "cache operations" refer to the procedure and theory discussed above. Cache operations rely on the observation that many computer programs frequently use only a relatively small number of data addresses on a cyclic basis, and those commonly used values will end up in the high-speed cache memory, providing efficient access.

In the event that the desired data is determined not to be within the data cache, the cache logic unit indicates that a "cache miss" has occurred for the access instruction (i.e., a miss load or a miss store), and the instruction causing the miss is called the missed instruction. When a cache miss occurs, the desired data must be accessed from, or written to, the external memory, which is usually not located with the cache memory unit. This access to the external memory takes longer than a cache memory access. During the delay, many prior art CPUs cannot issue further instructions while the external memory address associated with the missed instruction is being accessed, because of problems of data inconsistency. These further instructions are referred to as subsequent instructions.

A prior art cache system is illustrated in the block diagram of FIG. 1. The external memory unit 60 is coupled to interface control unit 14, which provides controlled access to the external memory 60. A high-speed, limited-size cache unit 10 is coupled to the logic unit 14. The high-speed cache unit is coupled to a microprocessor instruction processor 50 via a cache control unit 12, which controls access to the cache by microprocessor instructions and determines whether or not the data associated with those instructions resides in the cache. The microprocessor instruction processor 50, the logic unit 12 and the high-speed cache 10 are all located within the chip of the microprocessor 5. Because of this location, and other special characteristics, the cache memory 10 provides a high-speed, efficient interface to the microprocessor. When an instruction generating a cache miss is encountered, the instruction is executed through the external memory 60. When the desired data is obtained via the logic unit 14, it is forwarded to the microprocessor unit 50 for processing. The data is also placed into the cache 10 and tagged for subsequent use.

Prior Cache Systems

Prior cache systems are either totally blocking or non-blocking with respect to the microprocessor. In a totally blocking system, each time there is a miss instruction, every subsequent instruction, whether a load or a store instruction, must be suspended until the miss instruction is completely executed (i.e., until external memory 60 is accessed). This is done by stalling or blocking execution of the microprocessor, and it is necessary to prevent data inconsistencies between the subsequent instructions and the miss instructions. Therefore, after a load instruction generates a cache miss, all subsequent load instructions are stalled until the cache miss load instruction is fully executed through external memory. This implementation does provide data consistency, but at a reduced operational speed due to the large number of stalls generated for each cache miss. This prior art system does not examine the character of the subsequent operations to determine whether they depend on the result of the missed instruction, or whether they may be allowed to execute out of order without causing data inconsistencies. The totally blocking prior art system merely prohibits execution of any instruction following a miss instruction that has not completed execution (i.e., while a miss instruction is outstanding). These prior art systems are not advantageous because the processing performance of the microprocessor is degraded by the continual blocking of instructions whenever miss instructions occur; therefore, these systems are not very efficient. What is needed is a system that offers improved efficiency while protecting data consistency and integrity, one that allows some subsequent instructions to use the cache even though a preceding, outstanding instruction has not been completely executed. The present invention offers such capability.

Other prior art cache systems are non-blocking in that they never block subsequent instructions. From a performance standpoint these systems operate very rapidly and efficiently. However, these systems employ extremely complex and advanced circuitry to ensure data consistency during the operation of subsequent instructions, since some instructions will be executed out of order. This complex circuitry requires high-performance and often expensive components that are not appropriate for all systems and designs. Moreover, it is very difficult to integrate these data cache units into conventional microprocessor designs. What is needed is a system that protects data consistency and provides efficient operation, yet does not require an overly complex design or implementation and may be integrated into conventional microprocessor designs with modest modifications. The present invention offers such capability.

In view of the above, it is an object of the present invention to provide a method and apparatus for efficient memory caching by employing a partially blocking data cache scheme. It is an object of the present invention to implement the partially blocking data cache scheme without overly complex and advanced circuits, so that the scheme can be implemented in existing caching systems without undue expense or modification. It is a further object of the present invention to provide the above advantages in a system that fully ensures data consistency. A further object of the present invention is to provide a very efficient partially blocking caching system which can be advantageously utilized by a microprocessor having a relatively large number of registers with very little loss of performance.

Another object of the present invention is to provide processing efficiency and data consistency by allowing some subsequent instructions of a first type to be executed while a preceding instruction remains outstanding, while stalling other subsequent instructions of a second type during the same period. It is appreciated that other objects of the present invention not specifically enumerated herein will become apparent in the discussions to follow.

SUMMARY OF THE INVENTION

The present invention is drawn to a system, apparatus and method for a partially blocking data cache. As such, the present invention includes a partially blocking cache apparatus for use with a microprocessor, said microprocessor processing a plurality of instructions each having associated data, the apparatus comprising: a cache memory array for high-speed memory accessing with the microprocessor; first processing means for executing preceding cache miss load instructions, each having an associated data address that is not accessible by the cache memory array; and stalling means for temporarily preventing the microprocessor from executing a subsequent store instruction while one or more of the preceding cache miss load instructions are pending before the first processing means, the stalling means also for allowing the microprocessor to execute a subsequent load instruction while fewer than a predetermined number of the preceding cache miss load instructions are pending before the first processing means.
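As a rough illustration of the stalling means, the decision it makes for each instruction can be written as a small Python function. This is a software sketch under stated assumptions, not the claimed circuit: the names `Kind`, `should_stall` and `max_pending_loads` are hypothetical, the predetermined number is left as a parameter because the text does not fix its value, and instructions other than loads and stores are assumed not to be blocked by this rule.

```python
from enum import Enum


class Kind(Enum):
    LOAD = "load"
    STORE = "store"
    OTHER = "other"


def should_stall(kind: Kind, pending_misses: int, max_pending_loads: int) -> bool:
    """Return True if the block signal should be asserted for this instruction.

    pending_misses: preceding cache miss loads still pending before the
        first processing means (the bus control logic).
    max_pending_loads: hypothetical value of the "predetermined number".
    """
    if pending_misses < 0:
        raise ValueError("pending_misses cannot be negative")
    if kind is Kind.STORE:
        # Any outstanding load miss blocks a subsequent store.
        return pending_misses > 0
    if kind is Kind.LOAD:
        # Loads proceed while fewer than the predetermined number are pending.
        return pending_misses >= max_pending_loads
    # Assumption: other instruction types are not blocked by this rule.
    return False
```

For example, with a hypothetical limit of four, `should_stall(Kind.STORE, 1, 4)` is `True` while `should_stall(Kind.LOAD, 1, 4)` is `False`: a single outstanding miss blocks stores but not loads.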

The present invention further includes a partially blocking cache apparatus as described above with storage means for indicating the number of the preceding cache miss load instructions that are pending before the first processing means, the storage means communicatively coupled to the stalling means and also communicatively coupled to the first processing means; and miss indicating means for generating a miss indication associated with each cache miss load instruction, the miss indicating means coupled to the storage means, and wherein the first processing means comprises completion means for indicating to the storage means that a cache miss load instruction has been completely processed by the first processing means and is therefore no longer pending before the first processing means.

The present invention further includes a partially blocking cache apparatus as described above, wherein the storage means is a logical counter having an increment input and a decrement input; wherein the miss indicating means increments the logical counter upon each miss indication, the miss indicating means being coupled to the increment input of the logical counter; and wherein the completion means decrements the logical counter upon each cache miss load instruction that is completely processed by the first processing means, the completion means being coupled to the decrement input of the logical counter.
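The storage means can be modeled as an up/down counter. In the sketch below, the class name `OutstandingMissCounter`, its method names and the three-bit default width are hypothetical; the text specifies only an increment input driven by the miss indicating means and a decrement input driven by the completion means. The two exceptions stand for conditions the surrounding control logic is assumed to prevent.

```python
class OutstandingMissCounter:
    """Software model of the logical counter of outstanding load misses."""

    def __init__(self, width_bits: int = 3):
        # Hypothetical counter width; the text does not give one.
        self.max_value = (1 << width_bits) - 1
        self.value = 0

    def increment(self) -> None:
        # Increment input: driven by each miss indication.
        if self.value == self.max_value:
            # A full counter means the load should have been stalled instead.
            raise OverflowError("outstanding-miss counter is full")
        self.value += 1

    def decrement(self) -> None:
        # Decrement input: driven by the completion means.
        if self.value == 0:
            raise RuntimeError("completion signalled with no pending miss")
        self.value -= 1

    @property
    def stores_blocked(self) -> bool:
        # Stores may execute only when the counter reads zero.
        return self.value != 0
```

Two misses followed by two completions return the counter to zero, at which point `stores_blocked` becomes `False` again.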

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is an illustration of a prior art system employing a prior art data cache system.

FIG. 2 illustrates a program instruction sequence and an associated timing diagram illustrating the instruction sequences of the present invention, including a stalling period.

FIG. 3A and FIG. 3B illustrate block diagrams of two overall system embodiments of the data cache unit of the present invention.

FIG. 4 is a general block diagram of the present invention illustrating the data cache unit, the bus control logic (both within the microprocessor) and the external memory.

FIG. 5 is a detailed block diagram of the data cache unit and other components of the present invention.

FIG. 6 is a flow diagram of the major processing functions of the present invention.

DETAILED DESCRIPTION OF THE INVENTION

The present invention may operate within the environment of a conventional data cache unit of a microprocessor or a microprocessor system. Certain details of such a cache unit that are well known in the art of microprocessor technology and architecture are not described in detail so as not to unnecessarily obscure the inventive features of the present invention. In the following detailed description of the present invention, numerous specific details are set forth in order to provide a thorough understanding of the present invention. However, it will be obvious to one skilled in the art that the present invention may be practiced without these specific details. In other instances, well-known methods have not been described in detail so as not to unnecessarily obscure the present invention.

The present invention includes an apparatus and method for a partially blocking data cache in a computer system. In the present invention, instruction execution by the microprocessor is suspended for any store instruction that follows a cache missed load instruction, while execution of some subsequent load instructions following the cache missed load instruction is allowed. This suspension or "stalling" of the store instruction lasts until the data for each missed load instruction has been received from the external memory unit and no further missed load instructions are pending. The store instruction may then be executed. The purpose of stalling the store instruction when there are unfinished miss load instructions is to provide consistency within the memory structure and cache unit of the computer system. For instance, if an unfinished load instruction is on the instruction queue, then the desired contents of the address for the load instruction have not yet been read from external memory. If a subsequent store instruction were allowed to execute before the missed load instruction completes, the store instruction could overwrite the desired data in the external memory (with the store data for that same address) before the data has a chance to be read from the external memory for the load instruction. Furthermore, the present invention cannot allow a store instruction to write data into the cache memory while outstanding loads are pending, because a preceding, incomplete and outstanding load instruction may overwrite the store data in the cache when the cache is filled with data from the external memory. Either result is undesirable, because valid and desired data would be overwritten and lost, causing data inconsistency.
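The two hazards described above can be reproduced with a toy model of one missed load of x0 and one store to x0. The sketch splits the missed load into a read of external memory and a later cache fill, and assumes a write-through store that writes both the cache and external memory; the function `replay`, the event names, the address 0x100 and the values 1 and 2 are all hypothetical.

```python
X0 = 0x100  # hypothetical numeric address standing in for "x0"


def replay(events):
    """Replay events for one missed load of x0 and one store to x0.

    "read":  the BCL reads x0 from external memory for the missed load.
    "fill":  the fetched data is written into the cache memory.
    "store": a write-through store of the value 2 to x0 (cache and memory).
    Returns (value the load received, cached x0, external x0).
    """
    external = {X0: 1}  # x0 initially holds 1 in external memory
    cache = {}          # x0 is not cached, so the load misses
    fetched = None
    for event in events:
        if event == "read":
            fetched = external[X0]
        elif event == "fill":
            cache[X0] = fetched
        elif event == "store":
            external[X0] = 2
            cache[X0] = 2
        else:
            raise ValueError(f"unknown event {event!r}")
    return fetched, cache.get(X0), external[X0]


# Store stalled until the miss completes (the behaviour enforced here):
print(replay(["read", "fill", "store"]))  # (1, 2, 2): consistent
# Store allowed before the miss reads external memory:
print(replay(["store", "read", "fill"]))  # (2, 2, 2): load sees the later store
# Store allowed between the memory read and the cache fill:
print(replay(["read", "store", "fill"]))  # (1, 1, 2): cache disagrees with memory
```

Only the first ordering, in which the store waits for the outstanding miss, gives the load its original value and leaves the cache consistent with external memory.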

The present invention advantageously allows efficient processing by the microprocessor by allowing multiple cacheable load instructions to be processed after the occurrence of a cache miss load instruction. Prior art systems block the data cache memory and stall the microprocessor for any instructions subsequent to a cache miss associated with a load instruction until all the pending load instructions are completely processed and the external memory is accessed. The present invention, however, does not degrade microprocessor performance by automatically stalling the instruction processor whenever a load instruction having a cache miss is encountered. Rather, the present invention allows processing of load instructions that follow a cache miss instruction.

One theory of operation of the present invention is that the data associated with a cache miss instruction and the data associated with a subsequent load instruction whose data is in the cache cannot be from the same data address. That is, the associated data of these load instructions must have separate addresses, because one generated a cache miss while the other did not. Since the data addresses must be different, there can be no data inconsistency from allowing these load instructions to operate out of order or in parallel. Therefore, the present invention allows the subsequent cacheable load instruction to execute while a preceding, outstanding cache miss load instruction is still pending. If the subsequent load instruction were directed to the same data address as the outstanding load instruction, this subsequent instruction would also become outstanding (because it would generate a cache miss) and would be sent to the BCL 56 for processing; according to the present invention, this subsequent instruction would have to wait for the BCL 56 to execute the preceding outstanding load cache miss instruction. Again, no data inconsistency would occur. But, for the reasons stated above, in no case may a store instruction be allowed to operate while a preceding load miss is outstanding.
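The argument can be checked with a small model that classifies a stream of loads against the current cache contents. Under the assumptions stated in the code (no miss completes during the sequence, and a missed address is filled into the cache only on completion), a hit can never share an address with an outstanding miss, while a repeated load of an outstanding address simply misses again and queues behind it. The function `classify_loads` and the sample addresses are hypothetical.

```python
def classify_loads(resident, loads):
    """Classify loads issued while earlier misses may still be outstanding.

    resident: set of addresses currently held in the cache memory.
    Assumption: a missed address is not filled into the cache until its
    miss completes, and no miss completes during this short sequence.
    Returns (list of (address, outcome), addresses queued to the BCL).
    """
    outstanding = []
    results = []
    for addr in loads:
        if addr in resident:
            # A hit cannot share an address with an outstanding miss,
            # so executing it out of order is safe.
            assert addr not in outstanding
            results.append((addr, "hit"))
        else:
            # Misses, including a repeat of an outstanding address, queue in order.
            outstanding.append(addr)
            results.append((addr, "miss"))
    return results, outstanding


results, queued = classify_loads({0x40}, [0x00, 0x40, 0x00])
print(results)  # [(0, 'miss'), (64, 'hit'), (0, 'miss')]
print(queued)   # [0, 0]
```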

Therefore, to increase processing efficiency and speed, the present invention allows a predetermined number of load instructions to execute while a preceding load miss is outstanding. To maintain data consistency, the present invention stalls each cacheable store instruction while any load misses are outstanding. The present invention achieves this result with minimal hardware modifications to existing cache-based computer systems, and therefore offers an efficient apparatus and method for increasing the access speed of a computer system while ensuring data consistency therein.

The present invention operates on the theory that some of the subsequent instructions executed by the microprocessor during an outstanding miss instruction do not require the result of the preceding missed instruction and are independent of the data associated with the missed instruction; also, some subsequent instructions may be allowed to operate without interfering with the result of the outstanding instruction, and vice versa. These subsequent instructions should therefore be allowed to execute completely (using the cache memory 55) even though the missed instruction has not yet completed (due to the delay in accessing the external memory). Other instructions, if allowed to operate, either depend on the data related to the missed instruction, could interfere with the result of the missed instruction, or could have their own result disturbed when the miss instruction is executed. These subsequent instructions must be suspended from execution until the access for the missed instruction is complete; otherwise the cache data will not be consistent with the external memory. In these cases the present invention suspends the microprocessor, halting execution of the subsequent instruction in order to prevent inconsistencies within the data cache and external memory. The cache logic unit generates a block signal to the microprocessor to block the execution of any subsequent instruction in situations that may cause data inconsistencies. Since the cache logic unit allows some subsequent instructions to operate and blocks the operation of others, it is referred to as a partially blocking data cache.

For example, FIG. 2 illustrates a timing diagram of a typical execution sequence of program code associated with program 30 used with the present invention. Microprocessor instructions 1, 2, 3, to n are executed by the microprocessor in sequence in performing program 30. The timing diagram shows a time line, and each line indicates the processing period of one instruction. As can be seen, line 32 indicates that instruction 1 is executed first, at t1. Instruction 1 is a cache miss instruction; that is, the data required for execution of instruction 1 is not found within the data cache. Line 32 represents the total period required to access and process the required address associated with the data of instruction 1, from t1 to t4. For illustration, instruction 1 is a load instruction of address x0. Instruction 2 is a load instruction of address x1; the result of this instruction does not depend on the result of instruction 1, and executing instruction 2 will not interfere with execution of instruction 1, or vice versa. Also, the data required for instruction 2 is located within the data cache. Therefore, the present invention allows instruction 2 to execute from t2 to t3, while instruction 1 is still executing or pending. As seen by line 33, the processing period for instruction 2 is very short, in part due to the rapid access time of the data cache. The present invention advantageously allows instruction 2 to execute during the delay caused by instruction 1's external memory access.

Also at t3 of FIG. 2, the microprocessor attempts to execute instruction 3, which is a store instruction to address x0. However, data cannot be stored at this address in external memory, because the load instruction, instruction 1, is attempting to read data at this location in external memory and has not yet completed as of t3. If the store operation (instruction 3) were allowed to operate out of order (i.e., before the completion of instruction 1), it would overwrite the data of address x0 before the data was read by instruction 1. Also, the store instruction cannot store data into the cache memory location corresponding to this address before the preceding load instruction has completed, because the store data could be overwritten in that cache location when the preceding load instruction completes. In either case, the store instruction must be delayed by the present invention to maintain data consistency between external memory 60 and the cache memory 55. Therefore, as line 34 indicates, from t3 to t4 the execution of instruction 3 is suspended by the present invention until the desired data is accessed from x0 and instruction 1 completes at t4. During the period indicated by line 34, the cache logic unit of the present invention asserts a block signal, which is received by the microprocessor unit, to suspend execution of the subsequent instruction 3 while instruction 1 is executing. From t4 to t5 instruction 3 can then execute, and program 30 continues to instruction n, etc.
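The FIG. 2 sequence can be replayed with a toy clock-by-clock scheduler. The sketch assumes one instruction issued per clock, a one-clock hit, a hypothetical three-clock miss latency, and that a stalled store runs in one clock once released; `schedule`, `MISS_LATENCY` and the `totally_blocking` flag are hypothetical names, and the predetermined limit on outstanding misses is not modeled. With t1 to t5 mapped onto clocks 0 to 4, the partially blocking schedule reproduces lines 32, 33 and 34, and the same program under a totally blocking policy finishes one clock later.

```python
MISS_LATENCY = 3  # hypothetical external-memory latency, in clocks
HIT_LATENCY = 1   # the DCU returns hit data in one clock


def schedule(program, totally_blocking=False):
    """Return (name, start, end) for each instruction, issued in program order.

    program: list of (name, kind, hit) tuples, kind being "load" or "store".
    Stores are assumed to take HIT_LATENCY once they are allowed to run.
    """
    time = 0
    miss_ends = []  # completion times of outstanding miss loads
    timeline = []
    for name, kind, hit in program:
        # Misses that have finished by the issue time are no longer outstanding.
        miss_ends = [t for t in miss_ends if t > time]
        if miss_ends and (kind == "store" or totally_blocking):
            # Block signal asserted until every outstanding miss completes.
            time = max(miss_ends)
            miss_ends = []
        is_miss = kind == "load" and not hit
        end = time + (MISS_LATENCY if is_miss else HIT_LATENCY)
        if is_miss:
            miss_ends.append(end)
        timeline.append((name, time, end))
        time += 1  # the next instruction may issue one clock later
    return timeline


fig2 = [
    ("1: load x0", "load", False),  # cache miss
    ("2: load x1", "load", True),   # cache hit
    ("3: store x0", "store", True),
]
print(schedule(fig2))
# [('1: load x0', 0, 3), ('2: load x1', 1, 2), ('3: store x0', 3, 4)]
print(schedule(fig2, totally_blocking=True))
# [('1: load x0', 0, 3), ('2: load x1', 3, 4), ('3: store x0', 4, 5)]
```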

By allowing instruction 2 to operate during the operational period of instruction 1, the present invention increases processing efficiency over totally blocking cache systems. The present invention also maintains data consistency by blocking execution of the store instruction until the outstanding load miss instruction 1 is complete. It should be noted that if another load miss instruction is encountered before the outstanding load miss instruction (instruction 1) is complete, the new load miss instruction will not be blocked per se. It will be sent to the bus control logic 56 and placed onto a queue to be executed in turn after execution of instruction 1. As will be more fully discussed below, the bus control logic 56 has the capability to re-execute these instructions placed on its queue 57.
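The queueing behaviour described here can be sketched as follows. The class `BusControlLogic` and its methods are hypothetical; the sketch assumes queue 57 is serviced first-in, first-out and keeps the outstanding-miss count alongside it, so that stores are released only when every queued miss has completed.

```python
from collections import deque


class BusControlLogic:
    """Hypothetical model of the BCL's miss queue plus the DCU miss count."""

    def __init__(self):
        self.queue = deque()  # pending cache miss loads, oldest first
        self.pending = 0      # value of the outstanding-miss counter

    def submit_miss(self, name):
        # A new load miss is not blocked; it is queued behind earlier misses.
        self.queue.append(name)
        self.pending += 1

    def complete_next(self):
        # Assumption: misses are re-executed in arrival (FIFO) order.
        name = self.queue.popleft()
        self.pending -= 1
        return name

    def store_may_execute(self):
        return self.pending == 0


bcl = BusControlLogic()
bcl.submit_miss("insn 1")
bcl.submit_miss("insn 4")        # a second miss arrives while insn 1 is pending
print(bcl.store_may_execute())   # False
print(bcl.complete_next())       # insn 1
print(bcl.store_may_execute())   # False: insn 4 is still outstanding
print(bcl.complete_next())       # insn 4
print(bcl.store_may_execute())   # True
```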

Overall System and Environment of the Present Invention

The overall environment, or system, in which the preferred embodiment operates is first described. In general, digital computer systems used by the preferred embodiment of the present invention, as illustrated in block diagram format in FIG. 3A, comprise a bus 100 for communicating information between the elements of the system, a central processor 101 coupled with the bus for processing information and instructions, a random access memory 102 coupled with the bus 100 for storing information and instructions for the central processor 101, a read only memory 103 coupled with the bus 100 for storing static information and instructions for the processor 101, a data storage device 104 such as a magnetic disk and disk drive coupled with the bus 100 for storing program information and instructions, a display device 105 coupled to the bus 100 for displaying information to the computer user, an alphanumeric input device 106 including alphanumeric and function keys coupled to the bus 100 for communicating information and command selections to the central processor 101, a cursor control device 107 coupled to the bus for communicating user input information and command selections to the central processor 101, and a signal generating device 108 coupled to the bus 100 for communicating command selections to the processor 101.

It should be noted that some environments of the present invention may contain all or merely a portion of the above components. For example, if the present invention were implemented within a control device for an advanced photocopier or a laser printer, the input device would be a keyboard at the copier control or a signal input for the laser printer. The display would be the common LCD display of the photocopier or an LED indicator light. Of course, RAM and ROM devices are present within even simple systems. The signal generation device would include the logic required to detect and indicate copier/printer malfunctions, etc. It is also appreciated that throughout this discussion, reference may be made to data in a memory or the processing of data; it should be understood that the term "data" means both data that may be processed by an instruction and the instruction itself, i.e., the bytes that comprise a particular opcode. Therefore, data should be construed as representing not merely program data but also program instructions.

Referring still to FIG. 3A, the display device 105 utilized with the computer system and the present invention may be a liquid crystal device, cathode ray tube, a simple LED indicator light, or other display device suitable for creating graphic images and alphanumeric characters recognizable to the user. The cursor control device 107 allows the computer user to dynamically signal the two-dimensional movement of a visible symbol (pointer) on a display screen of the display device 105. Many implementations of the cursor control device are known in the art, including a trackball, mouse, joystick or special keys on the alphanumeric input device 106 capable of signaling movement in a given direction or manner of displacement. It is to be appreciated that the cursor means 107 also may be directed and/or activated via input from the keyboard using special keys and key sequence commands. Alternatively, the cursor may be directed and/or activated via input from a number of specially adapted cursor directing devices, including those uniquely developed for the disabled. In the discussions regarding cursor movement and/or activation within the preferred embodiment, it is to be assumed that the input cursor directing device or push button may consist of any of those described above and specifically is not limited to the mouse cursor device. Some systems encompassing the present invention may not include a cursor control device.

Referring to FIG. 3A, the data cache memory 55 and associated data cache unit 52 and bus control unit of one embodiment of the present invention are logically associated with block 109 and interface to the bus 100 by way of the bus control logic. The data cache unit 52 interfaces with the processor 101 via bus 100. The bus control logic 56 interfaces with external memory units 102 and 103 via bus 100. Although FIG. 3A illustrates the data cache memory and associated data cache unit and bus control unit as separate from the microprocessor, alternative embodiments of the present invention integrate these logical units together with the microprocessor 101. To this extent, the diagram illustrated in FIG. 3A should be construed as a functional block diagram and not necessarily a structural block diagram representing the present invention.

FIG. 3B illustrates the preferred embodiment of the present invention, which locates the data cache unit 52 and the bus control logic 56 within microprocessor 101. The bus control logic 56 is interfaced with the bus 100 and with the data cache unit 52. External memory 60 is shown interfaced with the bus 100. External memory may be composed of RAM 102 or ROM 103 or both. This memory 60 is "external" insofar as it is external to the microprocessor unit 101. Also located within microprocessor 101 is an instruction processor that executes the instructions of the program code; this instruction processor is interfaced with the DCU 52. The other components of FIG. 3B are analogous to those described with respect to FIG. 3A, and reference is made to those discussions above.

Overall Design of the Preferred Embodiment of the Present Invention

FIG. 4 illustrates a block diagram of the components of the preferred embodiment of the present invention. Aside from the external memory block 60, the elements of the preferred embodiment of the present invention are located within the microprocessor block 101, as illustrated by the dashed line. Block 52 is the Data Cache Unit or DCU, which controls the accessing and processing of the data cache memory 55 included within the DCU. The DCU 52 interfaces with the instruction processor 50 unit of the microprocessor via interface lines 68 and 66. The instruction processor 50 is the logical portion of the microprocessor that controls the execution of instructions within the microprocessor 101. Interface line 66 carries the blocking signal: when a stall or block signal is generated by the DCU 52, the instruction processor stalls or halts execution of subsequent instructions while the cache blocking signal 66 is asserted. Interface 68 carries an identification of the currently executed instructions, which is fed to the DCU 52 by the microprocessor instruction processor 50. The DCU 52 requires this information because it responds differently (i.e., stalls or does not stall) depending on the instruction type.

The DCU 52 is coupled to internal bus 54 (which is not the bus 100) and to the Bus Control Logic 56 (BCL, also called the Bus Control Unit) via interface lines 80 and 82. Control signals and other signals, which will be discussed more fully below, are transmitted between the BCL 56 and the DCU 52. When the instruction decoder 200 (of FIG. 5) is located outside the DCU 52, the load and store signals 300 and 310 are communicated over internal bus 54. The external memory 60 (which may be composed of RAM 102 or ROM 103 or both) is coupled to the BCL 56 via external bus 100 but is not directly coupled to the DCU 52. The DCU 52 determines when a block signal should be generated to suspend execution of subsequent instructions by the microprocessor's instruction processor 50. The DCU 52 also determines whether or not a cache miss occurred for a load instruction. The BCL 56, on the other hand, re-executes the cache miss load instructions and indicates when an outstanding instruction (cache miss) has been completely executed. An instruction queue 57 is located within the BCL 56 for holding multiple pending cache miss instructions.

The DCU 52 is a one kilobyte, direct mapped, write-through cache unit. When the DCU 52 does not have the data requested by a load instruction within cache memory 55, a cache miss occurs and the BCL 56 is notified to fetch the data from external memory 60. As the data returns from external memory 60, the BCL 56 instructs the DCU 52 to access the obtained data and update or replace its data cache entry within the data cache memory 55. For load instructions, the DCU replacement policy supports replacement of up to 16 bytes (a quad word, or 128 bits) in multiples of 4 bytes. For store instructions, it supports replacement of up to 16 bytes in multiples of 1 byte. In other words, the maximum replacement line size of the DCU is 16 bytes. The DCU 52 is considered a single cycle memory co-processor that returns data for normal load and store instructions in one clock if the data is already available in the cache. The DCU 52 is able to handle big endian as well as little endian memory accesses.
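
To make the geometry of the one kilobyte, direct mapped DCU concrete, the following Python sketch splits an effective address into tag, index and offset fields. It assumes a 16-byte line (the maximum replacement size stated above), which gives 1024 / 16 = 64 lines, and a 32-bit address; these field widths and the name `split_address` are illustrative assumptions, not details taken from the 80960CF.

```python
CACHE_BYTES = 1024                     # one kilobyte data cache memory 55
LINE_BYTES = 16                        # assumed line size (maximum replacement size)
NUM_LINES = CACHE_BYTES // LINE_BYTES  # 64 lines in a direct mapped cache

OFFSET_BITS = (LINE_BYTES - 1).bit_length()  # 4 bits select a byte in the line
INDEX_BITS = (NUM_LINES - 1).bit_length()    # 6 bits select the line


def split_address(addr: int) -> tuple[int, int, int]:
    """Split a 32-bit effective address into (tag, index, offset)."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (NUM_LINES - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)  # upper bits compared with tags 209
    return tag, index, offset


if __name__ == "__main__":
    tag, index, offset = split_address(0x12345678)
    print(hex(tag), index, offset)  # 0x48d15 39 8
```

Under these assumptions the upper 22 bits of a 32-bit address form the tag, bits 4 through 9 select the line, and the low 4 bits select the byte within the 16-byte line.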

It is appreciated that the BCL 56 assumes that every load instruction issued by the instruction decoder 200 is going to be processed by the DCU 52. However, the BCL 56 does trap all the necessary information associated with every load instruction so that the BCL 56 itself (not the instruction decoder 200) can reissue and execute a load instruction that misses the DCU (cache). Therefore, each time a load instruction misses the cache there is a loss of one clock cycle. It is further appreciated that the DCU 52 generates a blocking signal over line 66 in order to stall the instruction processor 50 of the microprocessor when a store instruction follows a pending load instruction. Most microprocessors are equipped with a halt signal pin or a stall opcode that allows the microprocessor's clock to cycle without any instruction processing being performed by the instruction processor 50. That is, for a given time period the instruction processing capability of the microprocessor is temporarily "frozen", allowing the BCL 56 just enough time to access the external memory to process the pending load instruction or instructions, as the case may be. Such methods and apparatus for stalling a microprocessor are well known in the art of microprocessor structure and technology and therefore are not described herein so as not to obscure the present invention. It is noted that the present invention may advantageously utilize any of the above well known microprocessor stall or halt methods.

FIG. 5 illustrates the components of the present invention in more detail. The elements of the Data Cache Unit 52 are illustrated in detail, and an Instruction Decoder 200 is illustrated coupled to the DCU 52. The Bus Control Logic (BCL) 56 is also illustrated coupled to the DCU 52 and the external memory 60. The BCL 56 also serves as a cache miss processing unit. The instruction decoder 200 is located within the microprocessor 101 unit and is responsible for decoding the instructions executed by the microprocessor; among other results, it determines whether the current instruction is a load instruction (via line 300) or a store instruction (via line 310). Although illustrated as separate from the Data Cache Unit 52, the instruction decoder could be implemented within the DCU 52. Further, the DCU 52 could be implemented with a separate, special purpose (dedicated) instruction decoder for the purpose of indicating a load or a store instruction. Such implementations would merely be a matter of design choice and are within the spirit and scope of the present invention.

The instruction decoder 200 is coupled to the logic circuits of the data cache unit 52 via line 300, which indicates that the present instruction from the instruction processor 50 is a load instruction. Line 310 indicates that the present instruction from the instruction processor 50 is a store instruction. The decode process of the present invention may be accomplished by instruction decoder 200 using any of the well known instruction decode methods available to microprocessor technology. Typically each instruction has an associated unique opcode that identifies the instruction and instruction type. The instruction decoder analyzes the opcode and indicates a load or a store instruction by asserting line 300 or line 310, respectively. When the current instruction is decoded, the effective address of the data associated with the instruction is input over line 212 to a comparator circuit 213. The comparator circuit 213 of the data cache unit 52 is specially designed to cycle through each tag 209 of the data cache memory unit 55 in order to locate an address match between the effective address 212 and one tag 209 of the cache memory 55. The comparator circuit 213 compares the upper bits of the effective address of the data to the tag data in an effort to locate a match: the effective address 212 is fed to one input of the comparator 213 while the other input sequentially scans the tag entries 209. If a match is not found, the present invention generates a miss signal 251 at the output of the compare logic 213. This indicates that the data associated with the present instruction is not within the cache memory unit 55 and must be accessed from the external memory 60 by the BCL, which must therefore re-execute the load instruction using external memory 60. This miss signal is fed into one of the inputs of the logical AND gate 235. The load signal 300 is also fed into gate 235, as is a cacheable signal 252 generated by the bus control unit 56.

Referring to FIG. 5, while the above occurs, the bus control unit 56 analyzes the effective address of the current data and determines whether this address is cacheable; that is, the BCL 56 determines whether the cache memory 55 is able to have a corresponding location for the memory block associated with the effective address 212. If the effective address is not cacheable, the present invention does not block the cache with regard to this effective address. Cacheability does not mean that the cache memory 55 actually contains the effective address location; rather, cacheability refers only to the fact that the cache memory 55 is able to provide access for the effective address of the data at some point in time. For instance, some portions of external memory never have a corresponding cache memory location; these areas are not cacheable, and as a result there is no need for the present invention to block the cache 55 with regard to these locations. In other words, there can be no cache data inconsistency associated with these memory addresses since no cache functions are performed on them. When a cacheable effective address 212 is present, the cacheable signal 252 is asserted. Therefore, the AND gate 235 of the present invention asserts its output when all three conditions are met: 1) there is a cache miss associated with the present instruction; 2) the present instruction is a load instruction; and 3) the load instruction has an associated effective address that is cacheable.
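
The increment condition for the load pending counter is a plain three-input AND. The Python function below is a behavioral sketch only; the parameter names `miss_251`, `load_300` and `cacheable_252` are hypothetical labels for the FIG. 5 signals, and the printed truth table simply shows that exactly one input combination pulses the counter.

```python
from itertools import product


def increment_pulse_235(miss_251: bool, load_300: bool, cacheable_252: bool) -> bool:
    """AND gate 235: pulse the increment input of load pending counter 225."""
    return miss_251 and load_300 and cacheable_252


if __name__ == "__main__":
    print("miss load cacheable -> increment")
    for miss, load, cacheable in product((False, True), repeat=3):
        pulse = increment_pulse_235(miss, load, cacheable)
        print(f"  {int(miss)}    {int(load)}      {int(cacheable)}     ->  {int(pulse)}")
```

Because the gate takes load signal 300 as an input, a store that misses the cache never increments counter 225 in this model.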

Upon the above conditions (a cacheable load with a cache miss), the AND gate 235 of the present invention pulses to increment the load pending counter 225; the output of the AND gate 235 is coupled to the increment input of the load pending counter 225. The load pending counter 225 keeps track of the number of load miss instructions that have been passed to the bus control logic 56 for re-execution but have not yet completely executed. In other words, the load pending counter records the number of load instructions on the queue 57. The load pending counter 225 is a two bit counter and counts 0, 1, 2 and 3; note that the load pending counter 225 cannot overflow because the maximum allowed number of outstanding loads on the BCL queue 57 is three. A special output 261 of the load pending counter 225 is fed to one of the inputs of AND gate 230; output 261 indicates when the count of the counter is nonzero. It should be appreciated that upon the occurrence of a miss, the BCL 56 receives the load instruction and re-executes the instruction using external memory 60. If the BCL 56 already has outstanding loads to re-execute, the current load cache miss instruction is delayed (i.e., placed in an empty slot of the queue 57).
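
A behavioral sketch of the two bit load pending counter 225, together with the status outputs derived from its bits: output 261 (the OR of the two bits, meaning a nonzero count) and output 263 (the AND of the two bits, meaning a count of three, used for the full-queue check). The class and method names are hypothetical, and the range check in this Python model stands in for the guarantee that the blocking logic never lets a fourth miss reach the counter.

```python
class LoadPendingCounter:
    """Behavioral model of the two bit load pending counter 225."""

    def __init__(self) -> None:
        self.bit1 = 0  # most significant bit
        self.bit0 = 0  # least significant bit

    @property
    def count(self) -> int:
        return (self.bit1 << 1) | self.bit0

    def _set(self, value: int) -> None:
        # Hardware never leaves 0..3; the check only makes that property visible.
        if not 0 <= value <= 3:
            raise ValueError("two bit counter out of range")
        self.bit1, self.bit0 = (value >> 1) & 1, value & 1

    def increment(self) -> None:  # driven by AND gate 235
        self._set(self.count + 1)

    def decrement(self) -> None:  # driven by fill signal 253
        self._set(self.count - 1)

    @property
    def nonzero_261(self) -> bool:  # OR of the two bits
        return bool(self.bit1 | self.bit0)

    @property
    def full_263(self) -> bool:  # AND of the two bits: binary 11, a count of 3
        return bool(self.bit1 & self.bit0)
```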

Assume that a cacheable store instruction is indicated by instruction decoder 200 asserting line 310, and assume also that this condition exists subsequent to a load cache miss that is still outstanding. The output 265 of AND gate 230 generates the blocking signal, which becomes asserted upon any subsequent cacheable store instruction that occurs while an outstanding cache miss load instruction is on the BCL queue 57, as indicated by the load pending counter 225. The counter output 261 of the present invention is asserted whenever the load pending counter 225 currently holds a count that is not equal to zero. This can be implemented in logic by feeding the output bits of the two bit counter into an OR gate and taking the OR gate output as signal 261; thus, whenever either bit of the counter is nonzero, the OR gate generates an asserted signal. The line 310, which indicates a store instruction, is also fed into one of the inputs of the AND gate 230. The last input of the AND gate 230 receives the cacheable signal 252, which indicates that the memory address associated with the current instruction may reside in cache memory 55; when line 252 is asserted, the store instruction is cacheable.

Referring to FIG. 5, AND gate 230 of the present invention generates a blocking signal at line 265 whenever a cacheable store instruction is present and the load pending counter 225 is nonzero, indicating that there are one or more outstanding load instructions on the BCL queue whose memory access operations to the external memory 60 have not yet returned. The 265 signal is fed into the interface 66 (of FIG. 4) and is received by the instruction processor 50 of the microprocessor 101. The 265 signal causes the microprocessor to stall execution of the cacheable store instruction until every outstanding load miss instruction has been removed from the BCL queue, as indicated by the load pending counter 225. If a cache miss load instruction is encountered and the queue 57 is not full, the load instruction is placed onto the queue 57. Note that queue 57 is initially cleared (set to zero) on system start-up.
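
The store-blocking condition of AND gate 230 can be sketched the same way. In this Python model `pending_count` is the value held by counter 225, and output 261 is derived from its two bits with an OR, as described above; the function and parameter names are assumptions for illustration.

```python
def store_block_265(store_310: bool, cacheable_252: bool, pending_count: int) -> bool:
    """AND gate 230: assert blocking signal 265 for a cacheable store
    issued while counter 225 reports outstanding load misses."""
    bit1, bit0 = (pending_count >> 1) & 1, pending_count & 1
    nonzero_261 = bool(bit1 | bit0)  # OR of the two counter bits
    return store_310 and cacheable_252 and nonzero_261
```

In this model a noncacheable store passes even while loads are pending, which matches the rule that noncacheable addresses cannot create cache inconsistencies.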

The BCL 56 processes the outstanding load instructions stored on the queue 57 on a first-come, first-served basis (i.e., first in, first out). Each cache miss load instruction arriving at the BCL is placed by the present invention in an available location within the queue 57, and the BCL then re-executes the load instruction and processes the logic required to access the external memory 60 via the external bus 100. At the time the BCL 56 becomes aware of the cache miss load instruction and places the instruction and its associated memory address on the queue 57, the load pending counter 225 is updated to indicate an outstanding load. While the first outstanding load is being processed, other load instructions may be placed on the queue 57. When the required data returns from the external memory 60 via bus 100, the BCL 56 indicates to the data cache unit 52 that the data is available for immediate fill into the cache memory 55 at a location determined by the effective address of the desired data, tagged as such by tag 209. The BCL 56 indicates the cache fill condition by asserting signal 253, which indicates that a load miss instruction has completed and that the data associated with that instruction has been fetched from external memory 60 and is currently available.
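
The first in, first out behavior of BCL queue 57 can be sketched with `collections.deque`. In this Python model the queue stores only each load's address, external memory 60 is stood in for by a dictionary, and the class and method names are hypothetical. A bounded `deque(maxlen=3)` is deliberately not used, because it would silently discard the oldest entry instead of refusing a fourth miss.

```python
from collections import deque

QUEUE_DEPTH = 3  # queue 57 holds at most three pending load misses


class BusControlQueue:
    """Behavioral sketch of BCL queue 57 (first in, first out)."""

    def __init__(self) -> None:
        self.entries: deque[int] = deque()

    def is_full(self) -> bool:
        return len(self.entries) >= QUEUE_DEPTH

    def enqueue_miss(self, address: int) -> None:
        # The DCU must block a fourth miss before it gets here.
        if self.is_full():
            raise RuntimeError("queue 57 full: load must be blocked first")
        self.entries.append(address)

    def complete_oldest(self, memory: dict[int, int]) -> tuple[int, int]:
        """Finish the oldest load; return (address, data) for fill signal 253."""
        address = self.entries.popleft()
        return address, memory.get(address, 0)
```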

When fill signal 253 is asserted, the present invention updates the data 205 of cache memory 55 with the data returning from the external memory 60 at the proper tag location. The fill signal 253 is also fed into the decrement input of load pending counter 225 to decrement the count of pending cache miss load instructions. This is so because, when the fill condition is entered, the cache miss load instruction currently being processed by the BCL has finished, is therefore no longer on the queue 57, and one less outstanding load is present. Thus the present invention increments the two bit load pending counter 225 each time a cacheable load miss occurs and decrements it each time the BCL 56 indicates a fill condition, which represents the completion of an outstanding load instruction. It is appreciated that other interface signals 271, not necessary for the understanding of the present invention, are present between the BCL 56 and the DCU 52. These signals are not explained in detail so as not to unnecessarily obscure the advantageous aspects of the present invention.
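
Because every cacheable load miss both enters queue 57 and increments counter 225, and every fill both removes the oldest entry and decrements the counter, the counter always equals the queue occupancy. The Python sketch below replays a hypothetical sequence of `miss` and `fill` events to show that invariant; it assumes the sequence is legal (never more than three outstanding misses, never a fill with an empty queue), since the blocking logic described here is what guarantees that.

```python
from collections import deque


def run_events(events: list[str]) -> list[tuple[int, int]]:
    """Replay events; return (counter 225, queue 57 length) after each one.

    "miss": cacheable load miss, AND gate 235 pulses and the load joins queue 57.
    "fill": BCL asserts fill signal 253 and the oldest load leaves queue 57.
    """
    counter_225 = 0
    queue_57: deque[int] = deque()
    trace: list[tuple[int, int]] = []
    for n, event in enumerate(events):
        if event == "miss":
            queue_57.append(n)  # n stands in for the load's address
            counter_225 += 1
        elif event == "fill":
            queue_57.popleft()
            counter_225 -= 1
        else:
            raise ValueError(f"unknown event {event!r}")
        trace.append((counter_225, len(queue_57)))
    return trace
```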

It is appreciated that the present invention also generates a blocking signal when a load instruction is issued but there are already the maximum number (three) of outstanding load miss instructions on the BCL queue 57. A number of embodiments may be employed to accomplish this task. One embodiment of the present invention has the BCL 56 assert a signal when the queue 57 is full and has the data cache unit 52 respond to this signal, whenever another cache miss load instruction is encountered, by blocking the instruction processor 50. Another embodiment, the preferred embodiment of the present invention as illustrated by FIG. 5, is implemented with logic in the DCU 52 and utilizes the load pending counter 225. In more detail, AND gate 220 receives input from signal 252, which indicates that the current instruction is cacheable. The AND gate 220 also receives an input from signal 300, which indicates that a load instruction has been encountered by the instruction decoder 200. Finally, the last input to the AND gate 220 is signal 263, originating from the load pending counter 225.

Signal 263 indicates when the load pending counter 225 currently holds a count of three, indicating that the queue 57 is full. This signal may be implemented by supplying both bits of the load pending counter 225 to an AND gate, which asserts its output when both bits are logical "1", i.e., a three in binary. Gate 220 generates a block signal on line 221 whenever the counter holds a value of three and a cacheable load instruction is encountered. Signal 221, like signal 265, is fed to the instruction processor 50 via line 66 and instructs the microprocessor to stall execution of the current load instruction. The instruction processor 50 must halt on this condition because no further load miss instructions can be placed on the BCL queue 57 until a slot is opened. When the first outstanding load instruction on the BCL queue 57 is complete, a cache fill signal is generated over line 253 by the BCL 56, and the cache memory 55 of the present invention is updated with the data from the external memory 60. The load pending counter 225 is then decremented by signal 253, and the blocking signal at 221 is no longer asserted by AND gate 220. At this time, the stalled load instruction is executed by the present invention. If the load instruction generates a cache miss then, as described above, the present invention places it onto the queue 57 of the BCL 56 and increments the load pending counter 225.
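
The full-queue check of AND gate 220 can be modeled like gate 230, with signal 263 derived as the AND of the two counter bits. This is a hypothetical Python sketch; following the FIG. 5 description, its only inputs are load signal 300, cacheable signal 252 and the counter value that produces signal 263.

```python
def load_block_221(load_300: bool, cacheable_252: bool, pending_count: int) -> bool:
    """AND gate 220: assert blocking signal 221 for a cacheable load
    issued while queue 57 is full (counter 225 reads three)."""
    bit1, bit0 = (pending_count >> 1) & 1, pending_count & 1
    full_263 = bool(bit1 & bit0)  # both bits set: binary 11
    return load_300 and cacheable_252 and full_263
```

In this model the output is asserted only for a count of three and drops as soon as a fill decrements the counter to two, matching the deassertion rule for line 221. Because no miss input is listed for gate 220, the model also holds a cacheable load that would have hit while the counter reads three.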

It is appreciated that the blocking signals generated at line 221 and line 265 are temporary. The blocking signal at line 265 is deasserted when the count of the load pending counter 225 reaches zero, and the blocking signal at line 221 is deasserted when the counter reaches a value of 0, 1 or 2. As can be seen from the above, a partially blocking data cache system has been described that is implemented with a two bit load pending counter, a halt signal and relatively modest additional logic. The advantage of this type of circuitry is that existing data cache units can easily be modified to use the technique offered by the present invention. The present invention ensures data integrity and consistency by blocking, with signal 265, store instructions that follow outstanding cache miss load instructions. Further, the present invention is efficient in that it allows multiple cacheable load instructions to execute within the cache memory 55 following a load instruction that caused a cache miss.

FIG. 5 also illustrates the Bus Control Logic 56 (BCL), which is also called the Bus Control Unit. As shown, the BCL 56 interfaces with the DCU 52 via the miss signal 251, the cacheable signal 252, the cache fill signal 253 and various other interface signals 271. Within the BCL 56 is an instruction queue 57 that stores the outstanding cache miss instructions (loads) that are still pending execution and access to the external memory 60. The BCL 56 re-executes each instruction within the queue 57, insofar as these instructions were first attempted in the DCU 52 but generated a cache miss at comparator 213. The BCL 56 executes the instructions in the queue 57 in first in, first out order, accessing the external memory 60 to retrieve the data associated with the effective address of each instruction. Once execution is complete, the BCL 56 asserts the fill signal, allowing the data to be placed into the cache memory 55. The counter 225 is then decremented and the queue 57 is updated (i.e., each entry moves up in line and a vacant entry becomes available at the end of the queue). As shown, the queue 57 is only three instructions deep and can hold only three pending cache miss instructions. Not all locations of the external memory are cacheable; the noncacheable locations are recorded in a directory table within the BCL 56. When the effective address 212 of an instruction corresponds to a noncacheable location within the external memory, as indicated by the directory, the cacheable signal 252 is deasserted. Likewise, when the effective address 212 corresponds to a cacheable location, as indicated by the directory, the cacheable signal 252 is asserted by the BCL 56.
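
The directory table that drives cacheable signal 252 could be as simple as a sorted list of noncacheable address ranges searched with Python's `bisect` module. The regions below are made up for illustration (the text gives no actual ranges), and `cacheable_252` is a hypothetical name for the lookup.

```python
from bisect import bisect_right

# Hypothetical directory of noncacheable regions: (start, end), end exclusive,
# sorted by start and non-overlapping. These ranges are illustrative only.
NONCACHEABLE_REGIONS = [
    (0x0000_0000, 0x0000_1000),    # e.g. an assumed memory-mapped I/O window
    (0xFFFF_0000, 0x1_0000_0000),  # e.g. an assumed boot ROM window
]
_STARTS = [start for start, _ in NONCACHEABLE_REGIONS]


def cacheable_252(address: int) -> bool:
    """Assert signal 252 unless the address falls in a noncacheable region."""
    i = bisect_right(_STARTS, address) - 1  # last region starting at or below address
    if i >= 0:
        start, end = NONCACHEABLE_REGIONS[i]
        if start <= address < end:
            return False
    return True
```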

Operation of the Preferred Embodiment of the Present Invention

The major functions of the preferred embodiment of the present invention operate according to the flow diagram of FIG. 6. The present invention stalls the microprocessor's instruction processor whenever: (1) a store instruction is encountered and there are still outstanding cacheable load miss instructions on the BCL queue 57 that have not yet been fully executed by the BCL 56; or (2) a load instruction is encountered but there are already three outstanding cacheable load miss instructions on the BCL queue 57 that have not yet been fully executed by the BCL 56. An outstanding load instruction on the queue is not fully executed until the BCL 56 receives the required data from the external memory 60 and issues a cache fill signal 253. Allowing a store instruction to execute before the data associated with an outstanding load is retrieved from external memory could cause desired data to be overwritten before the load instruction is processed, or the load could overwrite desired store data in the cache memory 55. The status and contents of the BCL queue 57 are tracked by the load pending counter 225 within the DCU 52, which indicates the number of outstanding cacheable load miss instructions located on the queue 57.

As shown in FIG. 6, the functions of the present invention begin at block 600. The instruction processor 50 presents the next microprocessor instruction at block 610. The data cache unit of the present invention only processes those instructions having associated cacheable memory addresses, because instructions with noncacheable data addresses do not create data inconsistencies between external memory 60 and cache memory 55. Therefore, throughout this discussion it is assumed that processed instructions involve cacheable memory addresses. The instruction decoder 200 decodes the current microprocessor instruction from the instruction processor 50, and the decoded result is tested at blocks 615 and 620 to determine the instruction type. Block 615 of the present invention tests whether the instruction is a load instruction by examining the opcode and indicates the result of the test on signal 300 (of FIG. 5). If the instruction is a (cacheable) load instruction, the present invention checks at block 630 whether the address of the requested data is found within the cache memory 55. Selected bits of the effective address 212 of the data associated with the cacheable load instruction are compared via comparator 213 to each of the tags 209 of the cache memory 55 in order to determine whether the data resides within the data 205 of the cache memory 55. If the desired address 212 is found (a cache hit), the data 205 is read from the cache memory 55 and the load instruction executes completely from the cache memory (and DCU 52) at block 695, without access to the external memory and without re-execution by the BCL 56. In this case no miss occurred, and after block 695 processing returns to block 610 to receive another instruction from the decoder.

It is appreciated that block 695 increases processor speed and efficiency by allowing cacheable load instructions to operate via the cache memory 55 while preceding load instructions that caused cache misses are still outstanding within the BCL 56. During the delay while data is being accessed from external memory for these outstanding load instructions, the present invention allows the microprocessor 101 to execute load instructions that hit in the cache memory 55 via block 695. For instance, even if two preceding cache miss instructions are on queue 57, block 695 is still allowed to process cache hit load instructions while those cache miss instructions remain pending in the BCL 56.
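
The load path of FIG. 6 (blocks 615 through 695) can be sketched as a small Python function. It follows the flow-chart order, testing for a hit at block 630 before checking the queue at block 635. The cache is reduced to a hypothetical address-to-data dictionary, the count in counter 225 is taken to equal the length of `queue_57`, and the function name is an assumption.

```python
from collections import deque

QUEUE_DEPTH = 3  # BCL queue 57 holds three pending load misses


def process_load(address: int, cache: dict[int, int], queue_57: deque) -> str:
    """One cacheable load through blocks 630-695 of FIG. 6.

    Returns "hit" (block 695), "enqueued" (blocks 640 and 650)
    or "stall" (block 655, blocking signal 221 asserted).
    """
    if address in cache:              # block 630: comparator 213 finds a match
        return "hit"                  # block 695: served from cache memory 55
    if len(queue_57) >= QUEUE_DEPTH:  # block 635: three loads already pending
        return "stall"                # block 655: wait for a fill
    queue_57.append(address)          # blocks 640/650: counter 225 up, join queue
    return "enqueued"


if __name__ == "__main__":
    pending = deque([0x40, 0x80])     # two earlier misses still outstanding
    cache = {0x10: 5}
    print(process_load(0x10, cache, pending))  # hit under two outstanding misses
    print(process_load(0x20, cache, pending))  # third miss joins the queue
    print(process_load(0x30, cache, pending))  # queue now full: stall
```

With two misses already on `queue_57`, the call for the cached address `0x10` is served as a hit without touching the queue, which is the situation block 695 is meant to allow.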

If the comparison unit 213 does not indicate that the desired memory address 212 is present within the cache memory 55, a miss condition occurs, which is signaled by the comparison unit 213 over line 251. In this case the data must be retrieved from the external memory 60 and the load instruction must be re-executed by the BCL 56. The present invention proceeds to block 635, where the load pending counter 225 (or the BCL queue 57) is checked to see whether there are already three outstanding cache miss load instructions. If there are fewer than three outstanding load instructions on the queue, the load pending counter 225 is incremented at block 640 by asserting the output of AND gate 235, and the current load instruction (which generated the most recent cache miss) is placed at the end of the BCL queue 57 at block 650 so that the BCL 56 may process this load instruction in turn and obtain the required data from the external memory 60. After the load instruction is placed on the queue 57, the present invention returns to block 610 to process the next instruction from the instruction processor 50, which is decoded by instruction decoder 200. It should be noted that if no outstanding load instructions are within the queue 57 when the current cache miss load instruction arrives, the BCL 56 may re-execute the load instruction at once. While the BCL 56 is executing this load instruction, subsequent cache miss load instructions are delayed (i.e., placed on the queue) until the presently processed instruction is completed by the BCL 56.

Referring still to FIG. 6, if there are already three outstanding load instructions waiting on the BCL queue, no more load instructions may be placed on the BCL queue. Block 635 tests whether there are already three pending load instructions, and if so, the present invention goes to block 655. At block 655 the present invention stalls the current load instruction until the first load instruction on the BCL queue is completely executed and the external memory has been accessed; during this period the cache is blocked and signal 221 is asserted. When the first outstanding instruction is completely executed, the BCL asserts a cache fill signal 253 at block 660 to indicate to the data cache unit 52 that the data associated with the first outstanding load instruction may be placed or replaced within the data cache memory 55. The cache memory 55 is then updated accordingly and the first outstanding load instruction is executed. At block 660 the first outstanding load instruction is taken off the BCL queue by the present invention, and a space on the queue is thereby made available. The load pending counter 225 is decremented at block 665 by fill signal 253 to indicate that an outstanding load has been taken off the BCL queue. Processing then continues to block 640, where the current load instruction is processed. Since it has been determined that the current load instruction is a cache miss, the load pending counter 225 is incremented and the current load instruction is then placed into the last (third) position of the BCL queue 57 by block 650. Processing then returns to block 610 to process the next instruction.
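
When block 635 finds the queue full, the flow waits for one fill, retires the oldest load and then enqueues the new miss. The Python sketch below models that sequence under simplifying assumptions: dictionaries stand in for cache memory 55 and external memory 60, counter 225 is represented by `len(queue_57)`, and the function name is hypothetical.

```python
from collections import deque

QUEUE_DEPTH = 3  # BCL queue 57 holds three pending load misses


def handle_full_queue_miss(address: int, queue_57: deque,
                           memory: dict[int, int], cache: dict[int, int]) -> int:
    """Blocks 635-665, then 640/650, of FIG. 6 for one cacheable load miss.

    Returns the number of fills waited for while signal 221 was asserted.
    """
    fills = 0
    while len(queue_57) >= QUEUE_DEPTH:  # blocks 635/655: stall, signal 221 asserted
        oldest = queue_57.popleft()      # block 660: oldest load leaves queue 57
        cache[oldest] = memory[oldest]   # fill signal 253 updates cache memory 55
        fills += 1                       # block 665: counter 225 decremented
    queue_57.append(address)             # blocks 640/650: counter up, join the end
    return fills
```

Only one fill is ever needed in this model, since a single free slot is enough for the stalled miss.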

At block 615, if the instruction decoded by the instruction decoder 200 is not a load instruction, processing continues to block 620, where the present invention checks whether the signal on line 310 indicates that the current instruction is a store instruction. If the instruction is neither a load instruction nor a store instruction, the present invention does not apply; current processing ends at block 625, and the flow re-enters at block 610 to process another microprocessor instruction. If the current instruction is a store instruction, processing continues from block 620 to block 670. At block 670 the present invention checks whether the load pending counter 225 is nonzero. If the counter 225 is zero, no cacheable loads are outstanding on the BCL queue 57 and the cacheable store instruction may be executed by block 675; in this case the cache is not blocked. Processing then continues to block 610 for the next instruction.

Referring to FIG. 6, if the counter is nonzero at block 670, at least one outstanding load instruction with a cache miss is pending. According to the present invention, the store instruction must not be allowed to execute with the cache while outstanding load instructions are pending, in order to maintain data consistency. The present invention cannot allow data to be written into main memory by a store instruction, because such data may overwrite desired data of a preceding, incomplete load instruction. Furthermore, the present invention cannot allow a store instruction to write data into the cache, because a preceding, incomplete outstanding load instruction may overwrite the store data when that load executes. Therefore, at block 680 the DCU issues a command, via signal 265, to block execution of the current store instruction. At this point the data cache memory 55 is blocked and the present invention stalls the instruction processor 50. The signal is asserted at line 265 and driven to the instruction processor 50 until the load pending counter 225 is decremented to zero. Upon the completion of each outstanding load instruction at block 685, a fill signal is generated over line 253 so that the data retrieved from the external memory 60 for the outstanding load can be placed or replaced in the data cache memory 55.

For each outstanding load that is fully executed by the BCL 56, once the BCL has retrieved its associated data from the external memory 60, the load pending counter 225 is decremented and the instruction is removed from the BCL queue 57; this occurs at block 690 when the fill signal is asserted. The present invention then cycles back to block 670. Eventually all of the outstanding loads complete and the load pending counter 225 is decremented to zero. At this time block 670 indicates that the load pending counter 225 is zero, the current store instruction is no longer stalled, and the blocking signal at 265 is no longer asserted. The microprocessor is therefore free to execute the store instruction, and data may be stored in the cache or in the external memory 60 at block 675. After the store instruction is processed, the present invention returns to block 610 for the next instruction from the microprocessor.
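
The store path (blocks 670 through 690, then 675) can be sketched in the same style, and the sketch also shows why the ordering matters. It assumes the write-through policy stated for the DCU earlier, so the store updates both the cache dictionary and the external memory dictionary; whether a store miss allocates a cache entry is not specified here, so the model simply writes the entry. Names and data structures are hypothetical.

```python
from collections import deque


def handle_store(address: int, value: int, queue_57: deque,
                 memory: dict[int, int], cache: dict[int, int]) -> list[tuple[int, int]]:
    """A cacheable store through blocks 670-690 and 675 of FIG. 6.

    Counter 225 is modeled as len(queue_57). Returns the (address, data)
    pairs filled while blocking signal 265 held the store.
    """
    filled: list[tuple[int, int]] = []
    while queue_57:                     # block 670: counter 225 is nonzero
        oldest = queue_57.popleft()     # blocks 680-690: wait for the next fill
        data = memory[oldest]           # BCL 56 fetches from external memory 60
        cache[oldest] = data            # fill signal 253 updates cache memory 55
        filled.append((oldest, data))
    cache[address] = value              # block 675: the store now proceeds,
    memory[address] = value             # written through to external memory 60
    return filled
```

For example, with a load of address `0xA0` outstanding and external memory holding 1 there, a store of 99 to `0xA0` returns `[(0xA0, 1)]`: the earlier load receives the old value 1, and only afterwards do the cache and memory both hold 99. Letting the store run first would, in this model, make the later fill read 99 on behalf of a load that precedes the store in program order.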

It can be seen from the above that some instructions are suspended from execution by a blocking signal generated at line 265 or 221, while other instructions are merely delayed by being placed onto the BCL queue 57. Load instructions whose data is found in the cache may generally execute while an outstanding load instruction that generated a cache miss is pending. Load instructions whose data is not found in the cache are placed onto the BCL queue 57 for eventual sequential execution by the BCL 56. Store instructions following an outstanding load instruction in the BCL that generated a cache miss are suspended from execution until all outstanding load instructions have been processed. Lastly, if the BCL queue 57 is full, a subsequent load instruction is blocked (suspended). As can be seen, the data cache unit of the present invention advantageously processes successive loads even though a previous load miss was encountered.

It is appreciated that the present invention operates advantageously within the 80960CF microprocessor available from Intel Corporation of Santa Clara, Calif., and will operate equally advantageously with any microprocessor and data cache unit having similar characteristics. The present invention operates advantageously within the above microprocessor because of the large number of registers it contains. As a result of the large number of registers, relatively few memory stores are required as a percentage of the overall instructions executed, because registers can be used for temporary storage. Since multiple load misses can be accepted without issuing a blocking signal, the most frequent cause of a microprocessor stall is a store instruction that follows an outstanding load instruction that missed the cache. Therefore, since memory store instructions are executed relatively infrequently, overall microprocessor performance is not significantly degraded by the stall signal. Furthermore, the present invention improves processing efficiency by allowing execution of multiple load instructions after a load instruction cache miss.

The preferred embodiment of the present invention is thus described: a partially blocking data cache memory and data cache unit, implemented with relatively modest hardware modifications over existing cache designs, that blocks subsequent store instructions while load misses are outstanding, yet allows successive loads through the cache even though a previous load miss was encountered. While the present invention has been described in one particular embodiment, it should be appreciated that the present invention should not be construed as limited by such embodiment, but rather construed according to the below claims.

What is claimed is:
 1. A cache memory apparatus for use with a microprocessor that processes a plurality of instructions having associated data, said apparatus comprising: cache memory means for providing high speed memory cache operations with said microprocessor; miss indicating means for generating a miss indication associated with each instruction of said plurality of instructions having an associated data address which is not accessible by said cache memory means, said each instruction called a cache miss instruction; and selecting means for temporarily blocking said microprocessor from processing a first instruction type of said plurality of instructions that follows said miss indication from said miss indicating means, said selecting means also for allowing processing of a second instruction type of said plurality of instructions that follows said miss indication from said miss indicating means, said selecting means communicatively coupled to said miss indicating means.
 2. A cache memory apparatus as described in claim 1 wherein said first instruction type is a store instruction type.
 3. A cache memory apparatus as described in claim 2 wherein said second instruction type is a load instruction type.
 4. A cache memory apparatus as described in claim 1 further comprising: external memory means for storage and retrieval of information compatible with said cache memory means; and cache miss processing means for processing each cache miss instruction by accessing said external memory means.
 5. A cache memory apparatus as described in claim 4 further comprising means for signaling whether said cache miss processing means is required to process any cache miss instruction.
 6. A cache memory apparatus as described in claim 5 wherein said means for signaling comprises counting means for incrementing on each of said miss indication generated by said miss indicating means and said counting means for decrementing upon execution completion of each of said cache miss instruction processed by said cache miss processing means.
 7. A cache memory apparatus as described in claim 4 wherein said selecting means temporarily blocks said microprocessor from executing said first instruction type while said cache miss processing means is required to process any cache miss instruction.
 8. A cache memory apparatus as described in claim 1 further comprising: external memory means for storage and retrieval of information compatible with said cache memory means; and cache miss processing means for processing said each cache miss instruction by accessing said external memory means, said cache miss processing means further comprising a queue for storing pending cache miss instructions.
 9. A cache memory apparatus as described in claim 8 wherein said first instruction type is a load instruction and wherein said selecting means blocks said microprocessor from processing said load instruction if said queue is full and wherein said second instruction type is also a load instruction and said selecting means does not block said microprocessor from processing said load instruction if said queue is not full.
 10. A cache memory apparatus as described in claim 6 wherein said cache memory means comprises a data cache memory array of approximately 1k bytes and wherein said counting means is a two bit counter.
 11. A cache apparatus for use with a microprocessor, said microprocessor processing a plurality of instructions each having associated data, said apparatus comprising: cache memory means for high speed data storage and retrieval for said microprocessor; first processing means for executing cache miss instructions each having an associated data address that is not accessible by said cache memory means; and stalling means for temporarily preventing said microprocessor from executing a first instruction type while one or more of said cache miss instructions are pending before said first processing means, said stalling means also for allowing said microprocessor to execute a second instruction type while one or more of said cache miss instructions are pending before said first processing means.
 12. A cache apparatus as described in claim 11 further comprising storage means for indicating if any of said cache miss instructions are pending before said first processing means, said storage means communicatively coupled to said stalling means and also communicatively coupled to said first processing means.
 13. A cache apparatus as described in claim 11 wherein said first instruction type is a store instruction.
 14. A cache apparatus as described in claim 11 wherein said second instruction type is a load instruction.
 15. A cache apparatus as described in claim 11 wherein said first instruction type is a store instruction and wherein said second instruction type is a load instruction.
 16. A cache apparatus as described in claim 13 wherein said stalling means temporarily prevents said microprocessor from executing said store instruction until said storage means indicates that no cache miss instructions are pending before said first processing means.
 17. A cache apparatus as described in claim 15 wherein said stalling means temporarily prevents said microprocessor from executing said store instruction until said storage means indicates that no cache miss load instructions are pending before said first processing means.
 18. A cache apparatus as described in claim 15 wherein said stalling means temporarily prevents said microprocessor from executing said load instruction until said storage means indicates that less than a maximum number of cache miss load instructions are pending before said first processing means.
 19. A cache apparatus as described in claim 12 wherein said first processing means comprises completion means for indicating to said storage means that a cache miss instruction is completely processed by said first processing means and is therefore no longer pending before said first processing means.
 20. A cache apparatus as described in claim 12 further comprising: miss indicating means for generating a miss indication associated with each microprocessor instruction having an associated data address which is not accessible by said cache memory means, said miss indicating means coupled to said storage means.
 21. A cache apparatus as described in claim 19 further comprising: miss indicating means for generating a miss indication associated with each microprocessor instruction having an associated data address which is not accessible by said cache memory means, said miss indicating means coupled to said storage means.
 22. A cache apparatus as described in claim 18 wherein said maximum number of cache miss load instructions is three.
 23. A cache apparatus as described in claim 12 wherein said storage means is a logical counter having an increment input and a decrement input.
 24. A cache apparatus as described in claim 21 wherein: said cache memory means is a logical memory array; said storage means is a logical counter having an increment input and a decrement input; said miss indicating means increments said logical counter upon each miss indication, said miss indicating means coupled to said increment input of said logical counter; and wherein said completion means decrements said logical counter upon each cache miss instruction that is completely processed by said first processing means, said completion means coupled to said decrement input of said logical counter.
 25. A cache apparatus for use with a microprocessor, said microprocessor processing a plurality of instructions each having associated data, said apparatus comprising: cache memory array for providing high speed memory cache operations with said microprocessor; first processing means for executing preceding cache miss load instructions each having an associated data address that is not accessible by said cache memory array; and stalling means for temporarily preventing said microprocessor from executing a subsequent store instruction while one or more of said preceding cache miss load instructions are pending before said first processing means, said stalling means also for allowing said microprocessor to execute a subsequent load instruction while less than a predetermined number of said preceding cache miss load instructions are pending before said first processing means.
 26. A cache apparatus as described in claim 25 further comprising: storage means for indicating if any of said preceding cache miss load instructions are pending before said first processing means, said storage means communicatively coupled to said stalling means and also communicatively coupled to said first processing means; and miss indicating means for generating a miss indication associated with each cache miss load instruction, said miss indicating means coupled to said storage means.
 27. A cache apparatus as described in claim 26 wherein said first processing means comprises completion means for indicating to said storage means that a cache miss load instruction is completely processed by said first processing means and is therefore no longer pending before said first processing means.
 28. A cache apparatus as described in claim 27 wherein: said storage means is a logical counter having an increment input and a decrement input; said miss indicating means increments said logical counter upon each miss indication, said miss indicating means coupled to said increment input of said logical counter; and said completion means decrements said logical counter upon each cache miss load instruction that is completely processed by said first processing means, said completion means coupled to said decrement input of said logical counter.
 29. A cache apparatus for use with a microprocessor, said microprocessor processing a plurality of instructions each having associated data, said apparatus comprising: cache memory array for high speed memory operations with said microprocessor; bus control logic executing cache miss instructions each having an associated data address that is not accessible by said cache memory array, said bus control logic further comprising a queue for containing any of said cache miss instructions that are pending before said bus control logic; and gating logic asserting a stall signal to temporarily prevent said microprocessor from executing a first instruction type while one or more of said cache miss instructions are pending before said bus control logic, said gating logic deasserting said stall signal to allow said microprocessor to execute a second instruction type while one or more of said cache miss instructions are pending before said bus control logic.
 30. A cache apparatus as described in claim 29 further comprising a logical counter for indicating the number of said cache miss instructions that are pending before said bus control logic, said logical counter communicatively coupled to said gating logic and also communicatively coupled to said bus control logic.
 31. A cache apparatus as described in claim 29 wherein said first instruction type is a store instruction.
 32. A cache apparatus as described in claim 31 wherein said second instruction type is a load instruction.
 33. A cache apparatus as described in claim 32 wherein said gating logic temporarily prevents said microprocessor from executing said store instruction until said logical counter indicates that no cache miss instructions are pending before said bus control logic.
 34. A cache apparatus as described in claim 32 wherein said gating logic temporarily prevents said microprocessor from executing said load instruction until said logical counter indicates that less than a maximum number of cache miss load instructions are pending before said bus control logic.
 35. A cache apparatus as described in claim 32 wherein said bus control logic asserts a digital fill signal indicating to said logical counter that a cache miss instruction is completely processed by said bus control logic and is therefore no longer pending.
 36. A cache apparatus as described in claim 35 further comprising: comparator logic asserting a miss signal associated with each microprocessor instruction having an associated data address which is not accessible by said cache memory array, said comparator logic coupled to said logical counter.
 37. A cache apparatus as described in claim 32 wherein said logical counter includes an increment input and a decrement input.
 38. A cache apparatus as described in claim 36 wherein: said logical counter includes an increment input and a decrement input; said miss signal of said comparator logic increments said logical counter upon each miss indication, said miss signal communicatively coupled to said increment input of said logical counter; and said digital fill signal of said bus control logic decrements said logical counter upon completion of each pending cache miss instruction, said digital fill signal communicatively coupled to said decrement input of said logical counter.
 39. A method for partial execution blocking of a microprocessor by a cache unit, said microprocessor for processing a plurality of instructions each having associated data, said method comprising the steps of: providing a cache memory array for providing high speed memory cache operations with said microprocessor; performing a first execution step by executing cache miss instructions each having an associated data address that is not accessible by said cache memory array; temporarily preventing said microprocessor from executing a first instruction type while one or more of said cache miss instructions are pending before said first execution step; and allowing said microprocessor to execute a second instruction type while less than a predetermined number of said cache miss instructions are pending before said first execution step.
 40. A method as described in claim 39 further comprising the step of indicating a number of said cache miss instructions that are pending before said first execution step.
 41. A method as described in claim 40 wherein said first instruction type is a store instruction.
 42. A method as described in claim 41 wherein said second instruction type is a load instruction.
 43. A method as described in claim 42 wherein said step of temporarily preventing said microprocessor from executing said store instruction operates until said step of indicating indicates that no cache miss load instructions are pending before said first execution step.
 44. A method as described in claim 42 wherein said step of allowing said microprocessor to execute a second instruction type operates until said step of indicating indicates that a maximum number of cache miss load instructions are pending before said first execution step.
 45. A method as described in claim 40 further comprising the step of asserting a signal each time a cache miss instruction is completely processed by said first execution step so that said step of indicating may update said number of said cache miss instructions that are pending before said first execution step.
 46. A method as described in claim 40 further comprising the step of: generating a miss indication signal associated with each microprocessor instruction having an associated data address which is not accessible by said cache memory array so that said step of indicating may update said number of said cache miss instructions that are pending before said first execution step.
 47. A method as described in claim 46 further comprising the step of: generating a miss indication signal associated with each microprocessor instruction having an associated data address which is not accessible by said cache memory array so that said step of indicating may update said number of said cache miss instructions that are pending before said first execution step.
 48. A method as described in claim 47 wherein said step of indicating is implemented via a logical counter.
 49. A computer system comprising: a) a bus for providing a common communication pathway; a processor for executing a plurality of instructions each having associated data, said processor coupled to said bus; memory storage unit for storage and retrieval of said data, said memory storage unit coupled to said bus; a display for display of said data; a data input device for inputting data to said computer system; and b) a cache unit for interfacing cache memory with said processor, said cache unit comprising: a cache memory array for providing high speed memory cache operations with said processor; bus control logic for executing cache miss instructions each having an associated data address that is not accessible by said cache memory array, said bus control logic coupled to said bus; and stalling circuitry for temporarily preventing said processor from executing a first instruction type while one or more of said cache miss instructions are pending before said bus control logic, said stalling circuitry also for allowing said processor to execute a second instruction type while one or more of said cache miss instructions are pending before said bus control logic.
 50. A computer system including a cache unit as described in claim 49 further comprising a storage device for indicating if any of said cache miss instructions are pending before said bus control logic, said storage device communicatively coupled to said stalling circuitry and also communicatively coupled to said bus control logic.
 51. A computer system including a cache unit as described in claim 49 wherein said first instruction type is a store instruction.
 52. A computer system including a cache unit as described in claim 49 wherein said second instruction type is a load instruction.
 53. A computer system including a cache unit as described in claim 49 wherein said first instruction type is a store instruction and wherein said second instruction type is a load instruction.
 54. A computer system including a cache unit as described in claim 51 wherein said stalling circuitry temporarily prevents said processor from executing said store instruction until said storage device indicates that no cache miss instructions are pending before said bus control logic.
 55. A computer system including a cache unit as described in claim 53 wherein said stalling circuitry temporarily prevents said processor from executing said store instruction until said storage device indicates that no cache miss load instructions are pending before said bus control logic.
 56. A computer system including a cache unit as described in claim 53 wherein said stalling circuitry temporarily prevents said processor from executing said load instruction until said storage device indicates that less than a maximum number of cache miss load instructions are pending before said bus control logic.
 57. A computer system including a cache unit as described in claim 50 wherein said bus control logic comprises completion circuitry for indicating to said storage device that a cache miss instruction is completely processed by said bus control logic and is therefore no longer pending before said bus control logic.
 58. A computer system including a cache unit as described in claim 50 further comprising: miss indicating circuitry for generating a miss indication associated with each instruction having an associated data address which is not accessible by said cache memory array, said miss indicating circuitry coupled to said storage device.
 59. A computer system including a cache unit as described in claim 57 further comprising: miss indicating circuitry for generating a miss indication associated with each instruction having an associated data address which is not accessible by said cache memory array, said miss indicating circuitry coupled to said storage device.
 60. A computer system including a cache unit as described in claim 56 wherein said maximum number of cache miss load instructions is three.
 61. A computer system including a cache unit as described in claim 50 wherein said storage device is a logical counter having an increment input and a decrement input.
 62. A computer system including a cache unit as described in claim 59 wherein: said storage device is a logical counter having an increment input and a decrement input; said miss indicating circuitry includes circuitry to increment said logical counter upon each miss indication, said miss indicating circuitry coupled to said increment input of said logical counter; and said completion circuitry includes circuitry to decrement said logical counter upon each cache miss instruction that is completely processed by said bus control logic, said completion circuitry coupled to said decrement input of said logical counter.
 63. A cache logic apparatus for interfacing a processor and a cache memory array of said cache logic apparatus, said processor sequentially processing a plurality of instructions each having associated data, said apparatus comprising: a cache memory array for providing high speed memory cache operations with said microprocessor, said cache memory array communicatively coupled with said microprocessor; first processing logic for sequentially executing preceding cache miss load instructions each having an associated data address that is not accessible by said cache memory array; and stalling circuitry for temporarily preventing said processor from executing a subsequent store instruction while one or more of said preceding cache miss load instructions are pending before said first processing logic but not yet completely processed by said first processing logic, said stalling circuitry also for allowing other of said plurality of instructions to execute, said other of said plurality of instructions being other than store instructions.
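Several of the claims (6, 10, 23, 24, 28, 38, 61 and 62) recite a counter whose increment input is driven by the miss indication and whose decrement input is driven by the completion (fill) signal of the bus control logic. A two-bit counter holds the values 0 through 3, which matches the maximum of three outstanding cache miss load instructions recited in claims 22 and 60. The following Python sketch models such a counter cycle by cycle. It is illustrative only: the class and method names are hypothetical, and both the treatment of a miss and a fill arriving in the same cycle and the two derived stall signals are assumptions rather than details given in the specification.

```python
class MissCounter:
    """Hypothetical cycle-level model of a two-bit outstanding-miss counter."""

    WIDTH = 2                # two-bit counter (cf. claim 10)
    MAX = (1 << WIDTH) - 1   # 3, the largest value two bits can hold

    def __init__(self) -> None:
        self.value = 0

    def clock(self, miss: bool, fill: bool) -> None:
        """Advance one cycle.

        `miss` drives the increment input and `fill` drives the decrement
        input. Assumption: a miss and a fill in the same cycle cancel out.
        """
        new_value = self.value + int(miss) - int(fill)
        if not 0 <= new_value <= self.MAX:
            # In this model the stall logic is expected to prevent this case.
            raise ValueError(f"counter would leave 0..{self.MAX}: {new_value}")
        self.value = new_value

    def store_stall(self) -> bool:
        # Stores are held while any load miss is pending.
        return self.value != 0

    def load_stall(self) -> bool:
        # Loads are held once the maximum number of misses is pending.
        return self.value == self.MAX


# Usage: three misses fill the counter, one fill frees a slot,
# then a miss and a fill arrive in the same cycle.
counter = MissCounter()
trace = []
for miss, fill in [(True, False), (True, False), (True, False),
                   (False, True), (True, True)]:
    counter.clock(miss, fill)
    trace.append((counter.value, counter.store_stall(), counter.load_stall()))
# trace == [(1, True, False), (2, True, False), (3, True, True),
#           (2, True, False), (2, True, False)]
```

In this sketch a single counter serves both stall conditions: a nonzero value holds stores, and a full count holds loads, so no separate queue-occupancy signal is modeled.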